Problem Set #7
Overview:
In this problem set, you will use the stringr package (part of the tidyverse) to work with strings and the lubridate package to work with dates and times. You will also load Twitter data that has been saved as an .RData file.
Question 1: Working with strings
- Load the following packages in the code chunk below: tidyverse and lubridate.
Using str_c() and the following objects as input, create the string "Roses are red, Violets are blue".
  - We encourage you to first sketch out what you want to do on some scratch paper.
  - Recall from the lecture example "Using str_c() on vectors of different lengths" that when vectors of different lengths are passed to str_c(), the elements of the shorter vectors are recycled.
  - Now try it yourself.
[1] "Roses are red, Violets are blue"
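Here is a minimal sketch of that recycling behavior. The input names `first_words`, `are`, and `colors` are hypothetical stand-ins, since the problem set's own objects are not reproduced here.

```r
library(stringr)

# Hypothetical inputs; the problem set provides its own objects
first_words <- c("Roses", "Violets")  # length 2
are <- "are"                          # length 1, recycled to length 2
colors <- c("red", "blue")            # length 2

# sep joins the pieces inside each element; collapse then joins the elements
poem <- str_c(first_words, are, colors, sep = " ", collapse = ", ")
poem
#> [1] "Roses are red, Violets are blue"
```

Without `collapse`, str_c() would return a length-2 character vector ("Roses are red", "Violets are blue") rather than one string.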
Pig Latin is a language game in which the first consonant of each word is moved to the end of the word, then "ay" is appended to create a suffix. For example, the word "Wikipedia" would become "Ikipediaway".
- Using str_c() and str_sub(), turn the given pig_latin vector into the string "igpay atinlay".
  - We encourage you to first sketch out what you want to do on some scratch paper.
  - First, think about what the final outcome will look like.
  - Then, think about how you can get there. Play around with the str_sub() function. What happens when you include different values in the str_sub() function?
  - This is low-key the trickiest question in the problem set. If you get stuck, ask your group or post a question on GitHub, then move on and come back to it later.
[1] "igpay atinlay"
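One possible sketch, assuming the given `pig_latin` vector holds the two English words `"pig"` and `"latin"` (its actual contents are not shown here). The idea is to split each word into "everything after the first letter" and "the first letter", then glue them back together with "ay".

```r
library(stringr)

# Hypothetical input; assumes the vector holds the two plain English words
pig_latin <- c("pig", "latin")

# str_sub(x, 2) keeps the 2nd character through the end: "ig", "atin"
# str_sub(x, 1, 1) keeps only the first character:       "p",  "l"
igpay <- str_c(str_sub(pig_latin, 2), str_sub(pig_latin, 1, 1), "ay", collapse = " ")
igpay
#> [1] "igpay atinlay"
```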
Using str_c() and str_sub(), decode the given secret_message. Your output should be a string.
  - Follow the same logic from above.
  - Sketch out what you want to do on some scratch paper. Break it down step by step. Play around with different values for the str_sub() function.
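A sketch of the reverse operation, assuming (this is an assumption, not the actual assignment data) that the message is written in the same single-letter Pig Latin scheme. The `secret_message` below is a made-up example. Negative positions in str_sub() count from the end of the string, which makes it easy to grab the letter sitting just before the trailing "ay".

```r
library(stringr)

# Hypothetical encoded input; the real secret_message may follow a different scheme
secret_message <- c("ecretsay", "essagemay")

# The moved letter is the 3rd character from the end (just before "ay")
moved_letter <- str_sub(secret_message, -3, -3)  # "s", "m"
# The rest of the word is everything except the last 3 characters
word_body <- str_sub(secret_message, 1, -4)      # "ecret", "essage"

decoded <- str_c(moved_letter, word_body, collapse = " ")
decoded
#> [1] "secret message"
```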
Question 2: Working with Twitter data
You will be using Twitter data we fetched from the following Twitter handles: UniNoticias, FoxNews, and CNN.
- This data has been saved as an .RData file.
- Use the load() and url() functions to load the news_df data frame from the URL: https://github.com/emoriebeck/psc290-data-FQ23/raw/main/05-assignments/07-ps7/twitter_news.RData
- Report the dimensions of the news_df data frame (rows and columns). Use the dim() function.
[1] 1000 90
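The real call needs an internet connection, so it appears below only as a comment. The runnable part is an offline sketch of the same load() mechanics using a hypothetical toy data frame (`toy_df`) saved to a temporary file. It shows why no `<-` is needed: load() recreates objects under the names they were saved with.

```r
# The problem set's call (requires an internet connection):
# load(url("https://github.com/emoriebeck/psc290-data-FQ23/raw/main/05-assignments/07-ps7/twitter_news.RData"))
# dim(news_df)

# Offline illustration with a hypothetical toy data frame
toy_df <- data.frame(screen_name = c("CNN", "FoxNews"), followers_count = c(100L, 250L))
rdata_path <- tempfile(fileext = ".RData")
save(toy_df, file = rdata_path)
rm(toy_df)  # remove it so we can watch load() bring it back

# load() restores each saved object under its original name
# and invisibly returns a character vector of those names
loaded_names <- load(rdata_path)
loaded_names
#> [1] "toy_df"
dim(toy_df)  # rows, then columns
#> [1] 2 2
```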
Subset your data frame news_df and create a new data frame called news_df2 that keeps only the following variables: user_id, status_id, created_at, screen_name, text, followers_count, profile_expanded_url.
  - Note: in the following questions we will ask you to create new columns, which means you have to assign (<-) the changes you make back to the existing data frame news_df2. Ex. news_df2 <- news_df2 %>% mutate(newvar = mean(oldvar))
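A minimal sketch of the subsetting step with dplyr's select(). The `news_df` built here is a hypothetical two-row stand-in (the real one has 1000 rows and 90 columns), and the extra `lang` column exists only to show that select() drops everything not listed.

```r
library(dplyr)

# Hypothetical stand-in for news_df; values are made up for illustration
news_df <- tibble(
  user_id = c("1", "2"),
  status_id = c("10", "20"),
  created_at = as.POSIXct(c("2020-03-26 22:41:09", "2020-03-27 08:05:30"), tz = "UTC"),
  screen_name = c("CNN", "FoxNews"),
  text = c("Breaking news", "Top story tonight"),
  followers_count = c(100L, 250L),
  profile_expanded_url = c("http://www.cnn.com", "http://www.foxnews.com"),
  lang = c("en", "en")  # extra column that select() will drop
)

# Keep only the seven requested variables, in the requested order
news_df2 <- news_df %>%
  select(user_id, status_id, created_at, screen_name, text,
         followers_count, profile_expanded_url)

dim(news_df2)
#> [1] 2 7
```

Later steps follow the `news_df2 <- news_df2 %>% mutate(...)` pattern so each new column is kept.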
Create a new column in news_df2 called text_len that contains the length of the character variable text.
- What is the class and type of this new column? Make sure to include your code in the code chunk below.
  - ANSWER: Both the class and the type are “integer”.
[1] "integer"
[1] "integer"
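A sketch of one way to produce those two "integer" results, using stringr's str_length() inside mutate(). The two-row `news_df2` is a hypothetical stand-in containing only the `text` column.

```r
library(dplyr)
library(stringr)

# Hypothetical two-row stand-in for news_df2
news_df2 <- tibble(text = c("Breaking news", "Top story tonight"))

news_df2 <- news_df2 %>%
  mutate(text_len = str_length(text))  # number of characters in each tweet

news_df2$text_len
#> [1] 13 17
class(news_df2$text_len)
#> [1] "integer"
typeof(news_df2$text_len)
#> [1] "integer"
```

class() reports the object's class attribute (or its implicit class), while typeof() reports R's internal storage type; for a plain integer vector the two happen to agree.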
Create an additional column in news_df2 called handle_followers that stores the Twitter handle and the number of followers associated with that handle in a string. For example, the entries in the handle_followers column should look like this: @[twitter_handle] has [number] followers.
- What is the class and type of this new column? Make sure to include your code in the code chunk below.
  - ANSWER: Both the class and the type are “character”.
[1] "character"
[1] "character"
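A sketch using str_c() to paste fixed text around the two columns; str_c() converts the integer follower counts to text as it joins. The handles and counts in this `news_df2` are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical two-row stand-in for news_df2
news_df2 <- tibble(
  screen_name = c("CNN", "FoxNews"),
  followers_count = c(100L, 250L)
)

news_df2 <- news_df2 %>%
  mutate(handle_followers = str_c("@", screen_name, " has ", followers_count, " followers"))

news_df2$handle_followers[1]
#> [1] "@CNN has 100 followers"
class(news_df2$handle_followers)
#> [1] "character"
typeof(news_df2$handle_followers)
#> [1] "character"
```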
- Lastly, create a column in news_df2 called short_web that contains a short version of profile_expanded_url without the http://www. part of the URL. For example, the entries in that column should look something like this: nytimes.com.
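One possible sketch uses stringr's str_remove() with a regular expression instead of fixed str_sub() positions. The pattern is an assumption about what the URLs look like: it also strips "https://" and tolerates a missing "www.", so URLs that don't start with exactly "http://www." are still handled. The example URLs are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical URLs, including variants without "www." or with "https"
news_df2 <- tibble(
  profile_expanded_url = c("http://www.cnn.com", "https://www.foxnews.com", "http://nytimes.com")
)

news_df2 <- news_df2 %>%
  # ^ anchors at the start; s? allows https; (www\\.)? makes "www." optional
  mutate(short_web = str_remove(profile_expanded_url, "^https?://(www\\.)?"))

news_df2$short_web
# values: "cnn.com", "foxnews.com", "nytimes.com"
```

A fixed-position alternative such as `str_sub(profile_expanded_url, 12)` works only when every URL begins with exactly the 11 characters "http://www.".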
Question 3: Working with dates/times
Using the column created_at, create a new column in news_df2 called dt_chr that is a character version of created_at.
- What is the class of the created_at and dt_chr columns? Make sure to include your code in the code chunk below.
  - ANSWER: created_at is of class “POSIXct” “POSIXt”, and dt_chr is of class “character”.
[1] "POSIXct" "POSIXt"
[1] "character"
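A minimal sketch of the conversion with base R's as.character(). The `created_at` values are hypothetical, built in UTC so the result doesn't depend on the local time zone.

```r
library(dplyr)

# Hypothetical date-times standing in for created_at
news_df2 <- tibble(
  created_at = as.POSIXct(c("2020-03-26 22:41:09", "2020-03-27 08:05:30"), tz = "UTC")
)

news_df2 <- news_df2 %>%
  mutate(dt_chr = as.character(created_at))

class(news_df2$created_at)
#> [1] "POSIXct" "POSIXt"
class(news_df2$dt_chr)
#> [1] "character"
news_df2$dt_chr[1]
#> [1] "2020-03-26 22:41:09"
```

If you prefer to spell the layout out explicitly, `format(created_at, "%Y-%m-%d %H:%M:%S")` also returns a character vector in that fixed "YYYY-MM-DD HH:MM:SS" form.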
- Create another column in news_df2 called dt_len that stores the length of dt_chr.
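A sketch of dt_len plus one likely reason to compute it: the next question slices dt_chr at fixed positions with str_sub(), which only works if every row shares the same layout. Tabulating dt_len with count() is a quick way to check that. The `dt_chr` values here are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical character date-times standing in for dt_chr
news_df2 <- tibble(dt_chr = c("2020-03-26 22:41:09", "2020-03-27 08:05:30"))

news_df2 <- news_df2 %>%
  mutate(dt_len = str_length(dt_chr))  # "YYYY-MM-DD HH:MM:SS" is 19 characters

# One row in this summary means every dt_chr value has the same width
news_df2 %>% count(dt_len)
```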
Next, create additional columns in news_df2 for each of the following date/time components:
- Create a new column date_chr for date (e.g. 2020-03-26) using the column dt_chr and the str_sub() function.
- Do the same for year yr_chr (e.g. 2020).
- Do the same for month mth_chr (e.g. 03).
- Do the same for day day_chr (e.g. 26).
- Do the same for time time_chr (e.g. 22:41:09).
news_df2 <- news_df2 |>
  dplyr::mutate(
    # dt_chr looks like "2020-03-26 22:41:09", so each piece sits at fixed positions
    date_chr = stringr::str_sub(dt_chr, start = 1, end = 10),
    yr_chr   = stringr::str_sub(dt_chr, start = 1, end = 4),
    mth_chr  = stringr::str_sub(dt_chr, start = 6, end = 7),
    day_chr  = stringr::str_sub(dt_chr, start = 9, end = 10),
    time_chr = stringr::str_sub(dt_chr, start = 12)  # from position 12 to the end
  )

# Inspect the new columns next to the original string
news_df2 |>
  dplyr::select(dt_chr, date_chr, yr_chr, mth_chr, day_chr, time_chr)

Using the column we created in the previous question, time_chr, create additional columns in news_df2 for the following time components:
- Create a new column hr_chr for hour (e.g. 22) using the column time_chr and the str_sub() function.
- Do the same for minutes min_chr (e.g. 41).
- Do the same for seconds sec_chr (e.g. 09).
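A sketch that follows the same fixed-position approach as the date columns: in "HH:MM:SS" the colons sit at positions 3 and 6, so each component is the two characters between them. The `time_chr` values are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical times standing in for time_chr
news_df2 <- tibble(time_chr = c("22:41:09", "08:05:30"))

news_df2 <- news_df2 %>%
  mutate(
    hr_chr  = str_sub(time_chr, start = 1, end = 2),  # "HH"
    min_chr = str_sub(time_chr, start = 4, end = 5),  # "MM" (skips the colon at 3)
    sec_chr = str_sub(time_chr, start = 7, end = 8)   # "SS" (skips the colon at 6)
  )

news_df2
```

Negative positions give an equivalent form for the last piece: `str_sub(time_chr, -2)` returns the final two characters.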
Now let’s get some practice with the lubridate package.
- Using the year() function from the lubridate package, create a new column in news_df2 called yr_num that contains the year (e.g. 2020) extracted from date_chr.
- Do the same for month mth_num.
- Do the same for day day_num.
- Do the same for hour hr_num, but extract it from the created_at column instead of date_chr.
- Do the same for minutes min_num.
- Do the same for seconds sec_num.
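A sketch of these steps with lubridate's accessor functions. One deliberate deviation: it parses `date_chr` with ymd() before calling year(), month(), and day(). That parsing step is an addition to what the question literally asks, used here so the text-to-date conversion is explicit. Hour, minute, and second come from the POSIXct `created_at` column, because `date_chr` carries no time. All values are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in; in the problem set these columns already exist in news_df2
news_df2 <- tibble(
  created_at = as.POSIXct(c("2020-03-26 22:41:09", "2020-03-27 08:05:30"), tz = "UTC"),
  date_chr   = c("2020-03-26", "2020-03-27")
)

news_df2 <- news_df2 %>%
  mutate(
    # Date parts: parse "YYYY-MM-DD" with ymd(), then extract each component
    yr_num  = year(ymd(date_chr)),
    mth_num = month(ymd(date_chr)),  # numeric month; label = TRUE would give names
    day_num = day(ymd(date_chr)),
    # Time parts: taken from the date-time column
    hr_num  = hour(created_at),
    min_num = minute(created_at),
    sec_num = second(created_at)
  )

news_df2 %>% select(yr_num:sec_num)
```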
Using the new numeric columns (e.g. day_num, mth_num) you’ve created in the previous step, reconstruct the date and datetime columns. Namely, add the following columns to news_df2:
- Use make_date() to create a new column called my_date that contains the date (year, month, day).
- Use make_datetime() to create a new column called my_datetime that contains the datetime (year, month, day, hour, minutes, seconds).
- What is the class of your my_date and my_datetime columns? Make sure to include your code in the code chunk below.
  - ANSWER:
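A sketch of the reconstruction with lubridate's make_date() and make_datetime(). The numeric columns are hypothetical stand-ins for the *_num columns from the previous step. make_datetime() uses `tz = "UTC"` unless told otherwise, which matches the UTC stand-in data used here.

```r
library(dplyr)
library(lubridate)

# Hypothetical numeric components standing in for the *_num columns
news_df2 <- tibble(
  yr_num = c(2020, 2020), mth_num = c(3, 3), day_num = c(26, 27),
  hr_num = c(22, 8), min_num = c(41, 5), sec_num = c(9, 30)
)

news_df2 <- news_df2 %>%
  mutate(
    my_date = make_date(year = yr_num, month = mth_num, day = day_num),
    my_datetime = make_datetime(year = yr_num, month = mth_num, day = day_num,
                                hour = hr_num, min = min_num, sec = sec_num)
  )

class(news_df2$my_date)
#> [1] "Date"
class(news_df2$my_datetime)
#> [1] "POSIXct" "POSIXt"
```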
Render to html and submit problem set
- Render to HTML by clicking the “Render” button near the top of your RStudio window (the icon with the blue arrow).
- Go to Canvas → Assignments → Problem Set 7.
- Submit both the .qmd and .html files.
- Use the naming convention “lastname_firstname_ps#” for your .qmd and .html files (e.g. beck_emorie_ps7.qmd & beck_emorie_ps7.html).